library(tidyverse)
library(plotly)
library(sf)
library(tigris)
library(leaflet)
Here, we load in the necessary datasets. We combine the electric data all into one dataframe, and we do the same with the gas data. We then combine the two into an overall dataframe:
YEARS <- 17:20
QUARTERS <- 1:4
pge_elec <- NULL
for(year in YEARS){
for(quarter in QUARTERS){
if(year != 20 || quarter != 4){
filename <- paste0("PGE_",20,year,"_Q",quarter,"_ElectricUsageByZip.csv")
temp <- read_csv(filename)
pge_elec <- rbind(pge_elec,temp)
}
}
}
pge_gas <- NULL
for(year in YEARS){
for(quarter in QUARTERS){
if(year != 20 || quarter != 4){
filename <- paste0("PGE_",20,year,"_Q",quarter,"_GasUsageByZip.csv")
temp <- read_csv(filename)
pge_gas <- rbind(pge_gas,temp)
}
}
}
Now we convert all units to kBTU’s, and keep the columns that we care about:
pge_elec_clean <-
pge_elec %>%
mutate(TOTALKBTU=TOTALKWH*3.412) %>%
select(!c(COMBINED,TOTALCUSTOMERS,AVERAGEKWH,TOTALKWH))
pge_gas_clean <-
pge_gas %>%
mutate(pge_gas,TOTALKBTU=TOTALTHM*100.066) %>%
select(!c(COMBINED,TOTALCUSTOMERS,AVERAGETHM,TOTALTHM))
Now that both dataframes have matching columns, we can combine them into a new aggregate dataframe:
pge_agg <- NULL
pge_agg <- rbind(pge_elec_clean,pge_gas_clean)
Now, our months are labeled 1->12 regardless of year. We will now make it such that year and month are carried by a date object:
pge_agg_dated <-
pge_agg %>%
mutate(
DATE = as.Date(paste0(pge_agg$MONTH,"/",1,"/",pge_agg$YEAR),"%m/%d/%Y")
) %>%
select(!c(MONTH,YEAR))
Now we create a vector of the Bay Area zip codes:
usa_zips <-
zctas(cb=T, progress_bar = F)
bay_county_names <-
c(
"Alameda",
"Contra Costa",
"Marin",
"Napa",
"San Francisco",
"San Mateo",
"Santa Clara",
"Solano",
"Sonoma"
)
bay_counties <-
counties("CA", cb = T, progress_bar = F) %>%
filter(NAME %in% bay_county_names)
bay_zips <-
usa_zips %>%
st_centroid() %>%
.[bay_counties, ] %>%
st_set_geometry(NULL) %>%
left_join(usa_zips %>% select(GEOID10)) %>%
st_as_sf()
bay_zipcodes <-
pull(bay_zips,GEOID10)
Now we will filter to grab the zip codes that we care about as well as filter the customer classes we care about, and then group and summarize by month, year, customer class, and totalkbtu:
pge_agg_final <-
pge_agg_dated %>%
filter(CUSTOMERCLASS %in% c("Elec- Residential","Elec- Commercial","Gas- Residential", "Gas- Commercial")) %>%
filter(ZIPCODE %in% bay_zipcodes) %>%
group_by(DATE,CUSTOMERCLASS,TOTALKBTU) %>%
summarize(TOTALKBTU = sum(TOTALKBTU,na.rm = T))
Now we begin to construct our plot:
pge_chart <-
ggplot(pge_agg_final) +
geom_bar(aes(x = DATE,y = TOTALKBTU,fill = CUSTOMERCLASS), stat = "identity", position = "stack") +
labs(x = "Month",y = "kBTU",fill = "Electricity Type", title = "Electricity and Gas Consumption from Jan. 2017 to Sept. 2020 \n in the Bay Area")
pge_chart %>%
ggplotly() %>%
layout(
xaxis = list(fixedrange = T),
yaxis = list(fixedrange = T)
) %>%
config(displayModeBar = F)
From this chart, we can see that the aggregate gas and electricity consumption in February 2020 in significantly less than it was in February 2019. The chart indicates that the largest decrease in consumption comes from residential gas. This makes sense, as self-quarantine orders encourages individuals to travel less and hence less gas was used. Beyond this, it seems that seasonal trends take over and it reasonably matches the consumption from the preceding year.
Now, we will create a map showing which neighborhoods had the greatest change in electricity consumption before and after the pandemic. To do this, we will assume that the pandemic in the United States began to take effect in February of 2020. We will then average the electricity consumption from Feb-Sept in 2020, and compare the same average from 2019 in a colored map. In preliminary analyses, I found the commercial and residential data to be different enough to warrant two separate maps.
First, we will compute the relevant data for the residential 2019 average and 2020 average:
pge_19_res_elec <-
pge_agg %>%
filter( CUSTOMERCLASS=="Elec- Residential",YEAR==2019,MONTH %in% 2:9) %>%
mutate(
ZIPCODE = ZIPCODE %>% as.character()
) %>%
group_by(ZIPCODE) %>%
summarize(
AVERAGEKBTU = sum(TOTALKBTU,na.rm = T)/8
)
pge_20_res_elec <-
pge_agg %>%
filter( CUSTOMERCLASS=="Elec- Residential",YEAR==2020,MONTH %in% 2:9,!(ZIPCODE %in% c("96150","96162")) ) %>%
mutate(
ZIPCODE = ZIPCODE %>% as.character()
) %>%
group_by(ZIPCODE) %>%
summarize(
AVERAGEKBTU = sum(TOTALKBTU,na.rm = T)/8
)
Now we make a new dataframe, taking the difference of the 2020 data to the 2019 data:
pge_diff_res_elec <-
pge_20_res_elec %>%
mutate(
DIFFBTU = AVERAGEKBTU - pge_19_res_elec$AVERAGEKBTU
) %>%
select(!c(AVERAGEKBTU))
Now, we are ready to plot this difference as a map in the Bay Area:
pge_diff_bay_res_elec <-
pge_diff_res_elec %>%
right_join(
bay_zips %>% select(GEOID10),
by = c("ZIPCODE" = "GEOID10")
) %>%
st_as_sf() %>%
st_transform(4326)
res_pal <- colorNumeric(
palette = c("red","white","green"),
domain =
pge_diff_bay_res_elec$DIFFBTU
)
leaflet() %>%
addTiles() %>%
addPolygons(
data = pge_diff_bay_res_elec,
fillColor = ~res_pal(DIFFBTU),
color = "white",
opacity = 0.5,
fillOpacity = 0.5,
weight = 1,
label = ~paste0(
round(DIFFBTU),
" kBTU difference in ",
ZIPCODE
),
highlightOptions = highlightOptions(
weight = 2,
opacity = 1
)
) %>%
addLegend(
data = pge_diff_bay_res_elec,
pal = res_pal,
values = ~DIFFBTU,
title = "Resdential difference in kBTU<br>before and after Covid-19"
)
Now we perform the same analysis, but for commericial electricity use:
pge_19_comm_elec <-
pge_agg %>%
filter( CUSTOMERCLASS=="Elec- Commercial",YEAR==2019,MONTH %in% 2:9) %>%
mutate(
ZIPCODE = ZIPCODE %>% as.character()
) %>%
group_by(ZIPCODE) %>%
summarize(
AVERAGEKBTU = sum(TOTALKBTU,na.rm = T)/8
)
pge_20_comm_elec <-
pge_agg %>%
filter( CUSTOMERCLASS=="Elec- Commercial",YEAR==2020,MONTH %in% 2:9,!(ZIPCODE %in% c("96150","96162")) ) %>%
mutate(
ZIPCODE = ZIPCODE %>% as.character()
) %>%
group_by(ZIPCODE) %>%
summarize(
AVERAGEKBTU = sum(TOTALKBTU,na.rm = T)/8
)
Taking the difference:
pge_diff_comm_elec <-
pge_20_comm_elec %>%
mutate(
DIFFBTU = AVERAGEKBTU - pge_19_comm_elec$AVERAGEKBTU
) %>%
select(!c(AVERAGEKBTU))
Now, we are ready to plot this difference as a map in the Bay Area:
pge_diff_bay_comm_elec <-
pge_diff_comm_elec %>%
right_join(
bay_zips %>% select(GEOID10),
by = c("ZIPCODE" = "GEOID10")
) %>%
st_as_sf() %>%
st_transform(4326)
comm_pal <- colorNumeric(
palette = c("red","white","green"),
domain =
pge_diff_bay_comm_elec$DIFFBTU
)
leaflet() %>%
addTiles() %>%
addPolygons(
data = pge_diff_bay_comm_elec,
fillColor = ~comm_pal(DIFFBTU),
color = "white",
opacity = 0.5,
fillOpacity = 0.5,
weight = 1,
label = ~paste0(
round(DIFFBTU),
" kBTU difference in ",
ZIPCODE
),
highlightOptions = highlightOptions(
weight = 2,
opacity = 1
)
) %>%
addLegend(
data = pge_diff_bay_comm_elec,
pal = comm_pal,
values = ~DIFFBTU,
title = "Commercial Difference in kBTU<br>before and after Covid-19"
)
Now that we have the two maps, we can analyze their results. First, note that the color in the residential map is misleading. The white-point is not set at zero, but rather at a negative value; this is simply due to the asymmetric nature of our data range. Thus, mousing over the zip codes reveals that most zip codes actually show an increase in electricity consumption, despite being a light shade of red. This outcome would follow our intuition, since a stay-at-home order would require individuals to use more energy at home, thus increasing consumption of residential electricity.
The commercial consumption map carries the same misleading color scale, since the white point is again a negative value. Mousing over the zip codes reveals that most zip codes show a decrease in commercial electricity consumption. Again, this follows our intuition, as a stay-at-home order would decrease the electricity needs of commercial buildings. Thus, it seems that the pandemic has caused the expected changes to both the residential and commercial electricity consumption trends.
One assumption we made in all our analyses was the cutoff of when we felt the pandemic had taken effect. We chose February 2020 as that cutoff point, but this was done somewhat arbitrarily. We could claim that we see a notable change in gas and electricity consumption in February 2020 on our chart, therefore justifying our cutoff. However, this is somewhat circular since we want to analyze our chart and maps using this same cutoff date. Perhaps a more sound method would be to check when stay-in-home orders became prevalent in the Bay Area, and check if that date correlates with the changes we see in our data.
Likewise, the method we used to construct our maps is somewhat arbitrary. We decided to use the average over a period of months in 2019 and 2020. Although this did capture the expected behavior, we are not sure that these changes are due exclusively to the pandemic or simply an overall increase/decrease in consumption from one year to the next. A more sound but labor intensive analysis could involve comparing the increase/decrease from 2019-> 2020 to the corresponding increase/decrease from 2017->2018, and 2018->2019, etc. This would help us distinguish typical changes from year to year from the changes experienced from the pandemic.